Self-supervised Regularization for Text Classification

Abstract

Text classification is a widely studied problem and has broad applications. In many real-world problems, the number of texts for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., 2019a) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. The SSL task is unsupervised: it is defined purely on input texts without using any human-provided labels. Training a model using an SSL task can prevent the model from being overfitted to a limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/UCSD-AI4H/SSReg.
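
The idea of performing a supervised task and a self-supervised task simultaneously can be illustrated with a combined training loss. The sketch below is a hypothetical, dependency-free illustration and not the authors' implementation: the function names, the weight `lam`, and the choice of masked-token prediction as the SSL task are assumptions made here for the example. The abstract does not specify the exact objective.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def ssl_reg_loss(cls_logits, cls_label, ssl_logits, ssl_targets, lam=0.1):
    """Supervised loss plus a weighted self-supervised loss (illustrative).

    cls_logits / cls_label: class scores and the human-provided label.
    ssl_logits / ssl_targets: one entry per masked position; the targets
        come from the input text itself, so no human labels are needed.
        Masked-token prediction is an assumed choice of SSL task.
    lam: assumed trade-off weight on the self-supervised term.
    """
    supervised = cross_entropy(cls_logits, cls_label)
    if ssl_targets:
        per_position = [
            cross_entropy(logits, target)
            for logits, target in zip(ssl_logits, ssl_targets)
        ]
        self_supervised = sum(per_position) / len(per_position)
    else:
        self_supervised = 0.0
    return supervised + lam * self_supervised


# Example: two classes, one masked position over a three-token vocabulary.
loss = ssl_reg_loss([0.0, 0.0], 0, [[0.0, 0.0, 0.0]], [1], lam=0.5)
```

Because the self-supervised term depends only on the input text, it acts as a data-dependent regularizer: setting `lam` to 0 recovers plain supervised training, and larger values put more weight on the auxiliary task.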

Similar resources

Semi-supervised learning for text classification using feature affinity regularization

Most conventional semi-supervised learning methods attempt to directly include unlabeled data into training objectives. This paper presents an alternative approach that learns feature affinity information from unlabeled data, which is incorporated into the training objective as regularization of a maximum entropy model. The regularization favors models for which correlated features have similar...

Soft-Supervised Learning for Text Classification

We propose a new graph-based semi-supervised learning (SSL) algorithm and demonstrate its application to document categorization. Each document is represented by a vertex within a weighted undirected graph, and our proposed framework minimizes the weighted Kullback-Leibler divergence between distributions that encode the class membership probabilities of each vertex. The proposed objective is con...

Variational Autoencoder for Semi-Supervised Text Classification

Although the semi-supervised variational autoencoder (SemiVAE) works in image classification tasks, it fails in text classification tasks when a vanilla LSTM is used as its decoder. From the perspective of reinforcement learning, it is verified that the decoder’s capability to distinguish between different categorical labels is essential. Therefore, Semi-supervised Sequential Variational Autoencoder (SSVAE) ...

Sprinkling Topics for Weakly Supervised Text Classification

Supervised text classification algorithms require a large number of documents labeled by humans, which involves a labor-intensive and time-consuming process. In this paper, we propose a weakly supervised algorithm in which supervision comes in the form of labeling of Latent Dirichlet Allocation (LDA) topics. We then use this weak supervision to “sprinkle” artificial words to the training documents...

A Supervised Clustering Method for Text Classification

This paper describes a supervised three-tier clustering method for classifying students’ essays of qualitative physics in the Why2-Atlas tutoring system. Our main purpose of categorizing text in our tutoring system is to map the students’ essay statements into principles and misconceptions of physics. A simple ‘bag-of-words’ representation using a naïve Bayes algorithm to categorize text was un...

Journal

Journal title: Transactions of the Association for Computational Linguistics

Year: 2021

ISSN: 2307-387X

DOI: https://doi.org/10.1162/tacl_a_00389